Data Mining


What is data mining?

Data mining is the process of extracting hidden knowledge from large volumes of raw data.

The importance of collecting data that reflect your business or scientific activities in order to achieve a competitive advantage is now widely recognized. Powerful systems for collecting data and managing it in large databases are in place in all large and mid-range companies. However, the bottleneck in turning this data into success is the difficulty of extracting knowledge about the system you study from the collected data.

Human analysts without special tools can no longer make sense of the enormous volumes of data that must be processed in order to make informed business decisions. Data mining automates the process of finding relationships and patterns in raw data and delivers results that can either be used in an automated decision support system or assessed by a human analyst. It addresses questions such as:

  • What goods should be promoted to this customer?

  • What is the probability that a certain customer will respond to a planned promotion?

  • Can one predict the most profitable securities to buy/sell during the next trading session?

  • Will this customer default on a loan or pay back on schedule?

  • What medical diagnosis should be assigned to this patient?

  • How large will the peak loads of a telephone or energy network be?

  • Why did the facility suddenly start producing defective goods?

All of these questions can probably be answered if the information hidden among megabytes of data in your database can be made explicit and put to use. Modeling the investigated system and discovering the relations that connect variables in a database are the subject of data mining.

Modern data mining systems learn on their own from the history of the investigated system, formulating and testing hypotheses about the rules that this system obeys. Once concise and valuable knowledge about the system of interest has been discovered, it can and should be incorporated into a decision support system that helps the manager make wise and informed business decisions.

Why use data mining?

Data might be one of the most valuable assets of your corporation - but only if you know how to reveal valuable knowledge hidden in raw data. Data mining allows you to extract diamonds of knowledge from your historical data and predict outcomes of future situations. It will help you optimize your business decisions, increase the value of each customer and communication, and improve customer satisfaction with your services.

Data that require analysis differ for companies in different industries. Examples include:

  • Sales and contacts histories
  • Call support data
  • Demographic data on your customers and prospects
  • Patient diagnoses and prescribed drugs data
  • Clickstream and transactional data from your website

In all these cases, data mining can help you reveal knowledge hidden in data and turn this knowledge into a crucial competitive advantage. Today, more and more companies acknowledge the value of this new opportunity and turn to Megaputer for leading-edge data mining tools and solutions that help optimize their operations and increase their bottom line.

What can Data Mining do for you?

  • Identify your best prospects and then retain them as customers.
    By concentrating your marketing efforts only on your best prospects, you will save time and money, thus increasing the effectiveness of your marketing operation.
  • Predict cross-sell opportunities and make recommendations.
    Whether you have a traditional or web-based operation, you can help your customers quickly locate products of interest to them - and simultaneously increase the value of each communication with your customers.
  • Learn parameters influencing trends in sales and margins.
    You think you could do this with your OLAP tools? True, OLAP can help you prove a hypothesis - but only if you know what questions to ask in the first place. In the majority of cases, you have no clue what combination of parameters influences your operation. In these situations, data mining is your only real option.
  • Segment markets and personalize communications.
    There might be distinct groups of customers, patients, or natural phenomena that call for different handling. If you have a broad customer range, you need to address teenagers in California and married homeowners in Minnesota with different products and messages in order to optimize your marketing campaign.

Reasons for the growing popularity of Data Mining

Growing Data Volume

The main reason automated computer systems are needed for intelligent data analysis is the enormous volume of existing and newly appearing data that require processing. The amount of data accumulated each day by various business, scientific, and governmental organizations around the world is daunting. According to information from the GTE research center, scientific organizations alone store about 1 TB (terabyte!) of new information each day. And it is well known that the academic world is far from being the leading supplier of new data. It is becoming impossible for human analysts to cope with such overwhelming amounts of data.

Limitations of Human Analysis

Two other problems that surface when human analysts process data are the inadequacy of the human brain when searching for complex multifactor dependencies in data, and the lack of objectivity in such an analysis. A human expert is always a hostage to previous experience of investigating other systems. Sometimes this helps, sometimes this hurts, but it is almost impossible to escape this bias.

Low Cost of Machine Learning

One additional benefit of using automated data mining systems is that the process costs much less than hiring an army of highly trained (and highly paid) professional statisticians. While data mining does not eliminate human participation in solving the task completely, it significantly simplifies the job and allows an analyst who is not a professional in statistics and programming to manage the process of extracting knowledge from data.

Tasks Solved by Data Mining

Predicting

A task of learning a pattern from examples and using the developed model to predict future values of the target variable.

The following PolyAnalyst engines help solve this task:

PolyNet Predictor, Find Laws, Memory Based Reasoning, and Linear Regression
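
As an illustration of the predicting task in general, the following is a minimal sketch of ordinary least-squares linear regression in plain Python. It is not the PolyAnalyst Linear Regression engine; the function names `fit_line` and `predict` and the sample figures are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(model, x):
    """Apply the learned model to a new value of the independent variable."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical history: advertising spend (x) versus units sold (y).
spend = [1.0, 2.0, 3.0, 4.0]
units = [3.0, 5.0, 7.0, 9.0]

model = fit_line(spend, units)   # learn the pattern from examples
forecast = predict(model, 5.0)   # predict the target for an unseen case
print(model, forecast)
```

The model is learned once from historical examples and then reused to estimate the target variable for new inputs, which is the essence of the task described above.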

Classification

A task of finding a function that maps records into one of several discrete classes.

The following PolyAnalyst engines help solve this task:

Memory Based Reasoning, Classify, and Discriminate
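
Memory-based approaches to classification are commonly built on nearest-neighbor search. The sketch below shows a generic k-nearest-neighbors classifier, assuming numeric features and Euclidean distance; it is not a description of how any PolyAnalyst engine works internally, and the function name `knn_classify` and the loan records are hypothetical.

```python
from collections import Counter
from math import dist

def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training records."""
    nearest = sorted(train, key=lambda rec: dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical loan history: (income in $1000s, debt ratio) -> outcome.
# Features are left unscaled here; real data would usually be normalized
# so that no single variable dominates the distance.
history = [
    ((20.0, 0.90), "default"),
    ((25.0, 0.80), "default"),
    ((30.0, 0.85), "default"),
    ((70.0, 0.20), "repay"),
    ((80.0, 0.30), "repay"),
    ((90.0, 0.10), "repay"),
]

label = knn_classify(history, (75.0, 0.25))
print(label)
```

Each new record is mapped to one of a fixed set of discrete classes, here the two loan outcomes.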

Detection of relations

A task of searching for the most influential independent variables for a selected target variable.

The Find Dependencies engine helps solve this task.
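
One simple way to approximate this task is to rank candidate variables by the absolute value of their Pearson correlation with the target. The sketch below assumes numeric columns and captures only linear dependencies; it is a general illustration rather than the method used by the Find Dependencies engine, and the names `pearson` and `rank_influences` and the sample data are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_influences(columns, target):
    """Order variable names from most to least linearly related to the target."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical data: which variable is more closely tied to sales?
columns = {
    "price":   [10.0, 12.0, 14.0, 16.0, 18.0],
    "weekday": [1.0, 5.0, 2.0, 4.0, 3.0],
}
sales = [50.0, 44.0, 41.0, 35.0, 30.0]

ranking = rank_influences(columns, sales)
print(ranking)
```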

Explicit modeling

A task of finding explicit formulae describing dependencies between various variables.

The PolyAnalyst Find Laws engine helps solve this task.

Clustering

A task of identifying groups of records that are similar to one another but different from the rest of the data. Often, the variables providing the best clustering should be identified as well.

The PolyAnalyst Cluster engine helps solve this task.
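
To make the clustering task concrete, here is a minimal sketch of k-means (Lloyd's algorithm), a widely used general-purpose clustering technique. It is not the algorithm of the PolyAnalyst Cluster engine; the function name `kmeans`, the choice of initial centers, and the sample points are assumptions made for illustration.

```python
from math import dist

def kmeans(points, centers, iterations=10):
    """Lloyd's algorithm: alternate nearest-center assignment and centroid update."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else center
            for members, center in zip(clusters, centers)
        ]
    # `clusters` holds the assignment made in the final iteration.
    return centers, clusters

# Hypothetical records described by two numeric variables.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
initial = [points[0], points[3]]   # assumed starting centers

centers, clusters = kmeans(points, initial)
print(centers)
print(clusters)
```

The number of clusters and the starting centers are supplied by the caller here; in practice both strongly affect the result.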

Market Basket Analysis

A task of processing transactional data in order to find groups of products that are frequently sold together. One also searches for directed association rules identifying the best product to offer with a current selection of purchased products.

The PolyAnalyst Market Basket Analysis engine helps solve this task.
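
Directed association rules are usually scored by support (the share of transactions containing both products) and confidence (the share of transactions with the first product that also contain the second). The sketch below computes both for product pairs only; it is a simplified illustration, not the PolyAnalyst engine, and the function name `association_rules`, the thresholds, and the baskets are hypothetical.

```python
from itertools import permutations

def association_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return directed pair rules as (antecedent, consequent, support, confidence)."""
    n = len(baskets)
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in permutations(items, 2):
        both = sum(1 for basket in baskets if a in basket and b in basket)
        has_a = sum(1 for basket in baskets if a in basket)
        support = both / n
        confidence = both / has_a if has_a else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Strongest rules first.
    return sorted(rules, key=lambda rule: rule[3], reverse=True)

# Hypothetical transactions.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "bread"},
    {"milk"},
]

rules = association_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

Because the rules are directed, "butter -> bread" and "bread -> butter" are scored separately: they share the same support but differ in confidence.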

Deviation Detection

A task of determining the most significant changes in some key measures of data from previous or expected values.
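
A simple way to flag such changes is to compare each current measure with its historical mean, expressed in units of the historical standard deviation (a z-score). The sketch below assumes that approach; the function name `find_deviations`, the threshold of 3, and the figures are hypothetical.

```python
from statistics import mean, stdev

def find_deviations(history, current, threshold=3.0):
    """Flag measures whose current value lies more than `threshold`
    standard deviations away from the historical mean."""
    flagged = {}
    for name, past in history.items():
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            continue  # no historical variation, so a z-score is undefined
        z = (current[name] - mu) / sigma
        if abs(z) > threshold:
            flagged[name] = z
    return flagged

# Hypothetical key measures of a production facility.
history = {
    "defect_rate":  [0.010, 0.012, 0.011, 0.009, 0.010, 0.011],
    "daily_output": [500.0, 510.0, 495.0, 505.0, 498.0, 502.0],
}
current = {"defect_rate": 0.030, "daily_output": 503.0}

flagged = find_deviations(history, current)
print(flagged)
```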

Different DM Technologies and Systems

It is instructive to compare the various existing approaches to data mining against the following three vital criteria:

  • Control of the significance of obtained results

  • Transparency of developed empirical models and their interpretability

  • Degree of search process automation and ease of use

To build a bridge from more traditional methods of data analysis to data mining methods, we start by discussing some more traditional approaches and then proceed to consider data mining methods.


©2004 Megaputer Intelligence. All rights reserved.